DATA 601: Project

Data Analysis of the Impact of GDP Growth on Greenhouse Gas Emissions

Maruthi Kumar Mutnuri

Fall 2019

Due: Wed. Oct. 16, 2019

1. Introduction:

The post-industrialization era has been marked by increased GDP growth for countries and an improved standard of living for their people. However, for many years people did not recognize the importance of climate and the implications of climate change for our society. With increasing globalization, drastic and visible climatic changes, along with their impact on society, have been observed, leading people to realize the gravity of the situation. Globalization leads countries to increase their production to meet both local and international demand. This increased global demand has led countries to burn more fossil fuels to run their machinery, which has increased air pollution that in turn has started affecting our ecosystems and people's health.
Many of the sources of outdoor air pollution are also sources of high CO2 emissions. A major source of carbon dioxide is the burning of fossil fuels, mostly by the energy and transport sectors. In regions where climate change alters temperature and precipitation patterns, the frequency and severity of forest fires are likely to increase, destroying ecosystems and habitats while simultaneously releasing more air pollutants.[1] Household air pollution from cooking with solid fuels accounted for 3.8 million deaths in 2016.[2] Similarly, outdoor air pollution in both cities and rural areas was estimated to cause 4.2 million premature deaths worldwide in 2016.[3]

[1] “Ambient air pollution: Health impacts”, World Health Organization. https://www.who.int/airpollution/ambient/health-impacts/en/

[2] “Mortality from household air pollution”, World Health Organization, 2016. https://www.who.int/gho/phe/indoor_air_pollution/burden/en/

[3] “Ambient Air Quality and Health”, World Health Organization, 2018. https://www.who.int/en/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health

Importing libraries for the project

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly as plotly
import plotly.offline as py
import plotly.graph_objs as go
import plotly.express as px

#%matplotlib inline

2. Dataset:

We use six datasets in this study: annual-co2-emissions-per-country, fossil-fuel-consumption-by-fuel-type, greenhouse-gas-emissions-by-gas, maddison-data-gdp-per-capita-in-2011us, modern-renewable-energy-consumption, and CO2-by-source.
  1. annual-co2-emissions-per-country: annual CO2 emissions in tonnes by entity, code (ISO code), and year. The Entity column includes the world, continents, and regions along with all the countries.
  2. fossil-fuel-consumption-by-fuel-type: fossil fuel consumption, namely Oil, Gas, and Coal, in terawatt-hours by entity, code (ISO code), and year. The Entity column includes the world, continents, and regions along with all the countries.
  3. greenhouse-gas-emissions-by-gas: emissions of various greenhouse gases, namely SF₆ gases, PFC gases, HFC gases, Nitrous oxide (N₂O), Methane (CH₄), and Carbon Dioxide (CO₂), in tonnes by entity, code (ISO code), and year. The Entity column includes the world, continents, and regions along with all the countries.
  4. maddison-data-gdp-per-capita-in-2011us: GDP per capita in 2011 USD by entity, code (ISO code), and year. The Entity column includes the world, continents, and regions along with all the countries.
  5. modern-renewable-energy-consumption: consumption of renewable energy, namely Hydropower, Wind, Solar, and Other renewables (modern biofuels; geothermal; wave & tidal), in terawatt-hours by entity, code (ISO code), and year. The Entity column includes the world, continents, and regions along with all the countries.
  6. CO2-by-source: CO2 emissions by source, namely Cement, Flaring, Oil, Coal, and Gas, in tonnes by entity, code (ISO code), and year. The Entity column includes the world, continents, and regions along with all the countries.
The datasets are obtained from Our World in Data, an open-source project that describes itself as a public good [4]:

This is why all the work we ever do is made available in its entirety as a public good

  • Visualizations and text are licensed under CC BY and may be freely used for any purpose.

  • The data is available for download.

  • And all code we write is open-sourced under the MIT license and can be found on GitHub.

[4] https://ourworldindata.org/about

Loading datasets

In [2]:
CO2emissions = pd.read_csv("annual-co2-emissions-per-country.csv")
FossilFuelConsumption = pd.read_csv("fossil-fuel-consumption-by-fuel-type.csv")
GreenGasEmissions = pd.read_csv("greenhouse-gas-emissions-by-gas.csv")
GDPpercapita = pd.read_csv("maddison-data-gdp-per-capita-in-2011us.csv")
RenewableEnergyConsumtion = pd.read_csv("modern-renewable-energy-consumption.csv")
CO2bySource = pd.read_csv("CO2-by-source.csv")

Displaying the raw data of all datasets before any filtering or wrangling

In [3]:
CO2emissions.head()
Out[3]:
Entity Code Year Annual CO₂ emissions (tonnes)
0 Afghanistan AFG 1949 14656.0
1 Afghanistan AFG 1950 84272.0
2 Afghanistan AFG 1951 91600.0
3 Afghanistan AFG 1952 91600.0
4 Afghanistan AFG 1953 106256.0
In [4]:
FossilFuelConsumption.head()
Out[4]:
Entity Code Year Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours)
0 Africa NaN 1965 316.777300 10.552653 335.590757
1 Africa NaN 1966 347.368863 11.788922 331.371978
2 Africa NaN 1967 344.129668 11.660342 341.010827
3 Africa NaN 1968 363.142507 11.901662 355.407259
4 Africa NaN 1969 368.629932 14.177457 357.506395
In [5]:
GreenGasEmissions.head()
Out[5]:
Entity Code Year SF₆ gases (tonnes) PFC gases (tonnes) HFC gases (tonnes) Nitrous oxide (N₂O) (tonnes) Methane (CH₄) (tonnes) Carbon Dioxide (CO₂) (tonnes)
0 Afghanistan AFG 1960 NaN NaN NaN NaN NaN 414371.0
1 Afghanistan AFG 1961 NaN NaN NaN NaN NaN 491378.0
2 Afghanistan AFG 1962 NaN NaN NaN NaN NaN 689396.0
3 Afghanistan AFG 1963 NaN NaN NaN NaN NaN 707731.0
4 Afghanistan AFG 1964 NaN NaN NaN NaN NaN 839743.0

Note: GDP per capita for all countries is in 2011 USD.

In [6]:
GDPpercapita.head()
Out[6]:
Entity Code Year GDP per capita (int.-$) ($)
0 Afghanistan AFG 1950 2392
1 Afghanistan AFG 1951 2422
2 Afghanistan AFG 1952 2462
3 Afghanistan AFG 1953 2568
4 Afghanistan AFG 1954 2576
In [7]:
RenewableEnergyConsumtion.head()
Out[7]:
Entity Code Year Hydropower (terawatt-hours) Wind (terawatt-hours) Solar (terawatt-hours) Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)
0 Africa NaN 1965 14.278806 0.0 0.0 0.0
1 Africa NaN 1966 15.649049 0.0 0.0 0.0
2 Africa NaN 1967 16.158333 0.0 0.0 0.0
3 Africa NaN 1968 18.622983 0.0 0.0 0.0
4 Africa NaN 1969 21.582897 0.0 0.0 0.0
In [8]:
CO2bySource.head()
Out[8]:
Entity Code Year Cement (tonnes) Flaring (tonnes) Oil (tonnes) Coal (tonnes) Gas (tonnes)
0 Afghanistan AFG 1949 0 0.0 0 14656 0.0
1 Afghanistan AFG 1950 0 0.0 65952 21984 0.0
2 Afghanistan AFG 1951 0 0.0 65952 25648 0.0
3 Afghanistan AFG 1952 0 0.0 62288 32976 0.0
4 Afghanistan AFG 1953 0 0.0 65952 36640 0.0

3. Data Wrangling:

We use several data wrangling steps, such as filtering, slicing, joins, transformation, sorting, and interpolation, to create clean dataframes ready for visualization.

3.1. Renaming columns

In [9]:
CO2emissions.rename(columns={"Entity": "Country"}, inplace = True)
FossilFuelConsumption.rename(columns={"Entity": "Country"}, inplace = True)
GreenGasEmissions.rename(columns={"Entity": "Country"}, inplace = True)
GDPpercapita.rename(columns={"Entity": "Country"}, inplace = True)
GDPpercapita.rename(columns={"GDP per capita (int.-$) ($)": "GDP per capita (2011USD)"}, inplace = True)
RenewableEnergyConsumtion.rename(columns={"Entity": "Country"}, inplace = True)
CO2bySource.rename(columns={"Entity": "Country"}, inplace = True)

Note: we focus on the G20 countries from 1985 to 2015. We chose the G20 because these countries account for about 80% of world trade and have high GDPs and high fossil fuel consumption. We discuss this further in the Guiding Questions section.

3.2. Slicing the data to the years 1985 and later, since all datasets have good coverage from 1985 onward, and dropping the Cement and Flaring columns from CO2bySource

In [10]:
CO2emissions = CO2emissions[CO2emissions.Year >= 1985]
FossilFuelConsumption = FossilFuelConsumption[FossilFuelConsumption.Year >= 1985]
GreenGasEmissions = GreenGasEmissions[GreenGasEmissions.Year >= 1985]
GDPpercapita = GDPpercapita[GDPpercapita.Year >= 1985]
RenewableEnergyConsumtion = RenewableEnergyConsumtion[RenewableEnergyConsumtion.Year >= 1985]
CO2bySource = CO2bySource[CO2bySource.Year >= 1985]
CO2bySource = CO2bySource.drop(columns = ['Cement (tonnes)', 'Flaring (tonnes)'])  # reassign instead of inplace on a filtered slice

Note: we build a dataframe of the G20 countries, excluding the European Union because the datasets lack good data for the EU.

In [11]:
G20 = {'Country': ['Argentina', 'Australia', 'Brazil', 'Canada', 'China', 'France', 'Germany', 'India', 'Indonesia', 'Italy', 'Japan', 'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 'South Korea', 'Turkey', 'United Kingdom', 'United States']}
G20Countries = pd.DataFrame(G20)
# G20Countries.columns.names = 'Country'
G20Countries.head()
Out[11]:
Country
0 Argentina
1 Australia
2 Brazil
3 Canada
4 China

3.3. Using an inner join to restrict the data to the G20 countries

In [12]:
CO2emissionsG20 = pd.merge(CO2emissions, G20Countries, on='Country', how='inner').sort_values(by='Country')
FossilFuelConsumptionG20 = pd.merge(FossilFuelConsumption, G20Countries, on='Country', how='inner').sort_values(by='Country')
GDPpercapitaG20 = pd.merge(GDPpercapita, G20Countries, on='Country', how='inner').sort_values(by='Country')
GreenGasEmissionsG20 = pd.merge(GreenGasEmissions, G20Countries, on='Country', how='inner').sort_values(by='Country')
RenewableEnergyConsumtionG20 = pd.merge(RenewableEnergyConsumtion, G20Countries, on='Country', how='inner').sort_values(by='Country')
CO2bySourceG20 = pd.merge(CO2bySource, G20Countries, on='Country', how='inner').sort_values(by='Country')
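Because an inner join silently discards any G20 name whose spelling does not match the dataset, it can help to check for unmatched names. The sketch below uses a small toy frame (the Argentina values come from the Out[14] preview; the Africa and World values are placeholders) to show that the inner join against the one-column country list keeps the same rows as a `Series.isin` filter, and to list the G20 names that have no rows. The names `co2`, `g20`, `merged`, `filtered`, and `missing` are illustrative only.

```python
import pandas as pd

# Toy stand-in for CO2emissions after renaming Entity -> Country.
# Argentina values are from Out[14]; the Africa and World values are placeholders.
co2 = pd.DataFrame({
    "Country": ["Africa", "Argentina", "Argentina", "World"],
    "Year": [1985, 1985, 1986, 1985],
    "Annual CO₂ emissions (tonnes)": [1.0, 100366195.2, 103933940.3, 2.0],
})
g20 = pd.DataFrame({"Country": ["Argentina", "Australia"]})  # shortened G20 list

# Inner join on Country, as in cell In [12]
merged = (pd.merge(co2, g20, on="Country", how="inner")
          .sort_values(["Country", "Year"])
          .reset_index(drop=True))

# Equivalent boolean filter with Series.isin
filtered = (co2[co2["Country"].isin(g20["Country"])]
            .sort_values(["Country", "Year"])
            .reset_index(drop=True))

# G20 names with no rows in the data: the inner join drops them without warning
missing = sorted(set(g20["Country"]) - set(co2["Country"]))
print(missing)
```

Running the same `missing` check against the full G20 list and each dataset is a cheap way to catch naming mismatches before they silently shrink the analysis.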

3.4. Sorting data by country and year

In [13]:
CO2emissionsG20.sort_values(['Country', 'Year'], inplace = True)
FossilFuelConsumptionG20.sort_values(['Country', 'Year'], inplace = True)
GDPpercapitaG20.sort_values(['Country', 'Year'], inplace = True)
GreenGasEmissionsG20.sort_values(['Country', 'Year'], inplace = True)
RenewableEnergyConsumtionG20.sort_values(['Country', 'Year'], inplace = True)
CO2bySourceG20.sort_values(['Country', 'Year'], inplace = True)

3.5 Displaying datasets after data wrangling

In [14]:
CO2emissionsG20.head()
Out[14]:
Country Code Year Annual CO₂ emissions (tonnes)
0 Argentina ARG 1985 100366195.2
1 Argentina ARG 1986 103933940.3
2 Argentina ARG 1987 114611063.8
3 Argentina ARG 1988 121129121.2
4 Argentina ARG 1989 116802018.4
In [15]:
FossilFuelConsumptionG20.head()
Out[15]:
Country Code Year Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours)
0 Argentina ARG 1985 217.81827 160.62700 6.990557
1 Argentina ARG 1986 250.64976 184.53950 10.407628
2 Argentina ARG 1987 267.57141 186.74975 11.553994
3 Argentina ARG 1988 267.60630 217.68250 11.993011
4 Argentina ARG 1989 242.68321 232.16600 12.009020
In [16]:
GDPpercapitaG20.head()
Out[16]:
Country Code Year GDP per capita (2011USD)
0 Argentina ARG 1985 12313
1 Argentina ARG 1986 13077
2 Argentina ARG 1987 13276
3 Argentina ARG 1988 12897
4 Argentina ARG 1989 11979
In [17]:
GreenGasEmissionsG20.head()
Out[17]:
Country Code Year SF₆ gases (tonnes) PFC gases (tonnes) HFC gases (tonnes) Nitrous oxide (N₂O) (tonnes) Methane (CH₄) (tonnes) Carbon Dioxide (CO₂) (tonnes)
0 Argentina ARG 1985 NaN NaN NaN 35305280.0 94616900.0 100596811.0
1 Argentina ARG 1986 NaN NaN NaN 35461830.0 94628500.0 104212473.0
2 Argentina ARG 1987 NaN NaN NaN 36097330.0 95248500.0 114942115.0
3 Argentina ARG 1988 NaN NaN NaN 36817150.0 98104200.0 121473042.0
4 Argentina ARG 1989 NaN NaN NaN 35704560.0 100292000.0 117090977.0
In [18]:
RenewableEnergyConsumtionG20.head()
Out[18]:
Country Code Year Hydropower (terawatt-hours) Wind (terawatt-hours) Solar (terawatt-hours) Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)
0 Argentina ARG 1985 20.656276 0.0 0.0 0.136371
1 Argentina ARG 1986 21.027505 0.0 0.0 0.141051
2 Argentina ARG 1987 21.889288 0.0 0.0 0.148871
3 Argentina ARG 1988 15.803774 0.0 0.0 0.162142
4 Argentina ARG 1989 13.328910 0.0 0.0 0.160424
In [19]:
CO2bySourceG20.head()
Out[19]:
Country Code Year Oil (tonnes) Coal (tonnes) Gas (tonnes)
0 Argentina ARG 1985 60437680 2773648 28810032.0
1 Argentina ARG 1986 62200064 3392864 30964464.0
2 Argentina ARG 1987 66453968 3777584 36698624.0
3 Argentina ARG 1988 66230464 4023072 42231264.0
4 Argentina ARG 1989 59752512 4143984 44609200.0

3.6 Replacing zeros with NaN and then interpolating the data in the GreenGasEmissionsG20 dataset

In [20]:
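GreenGasEmissionsG20.replace(0, np.nan, inplace=True)
# Interpolate only the numeric gas columns, separately within each country (limit_direction='both' also fills leading/trailing gaps)
gas_cols = GreenGasEmissionsG20.columns.drop(['Country', 'Code', 'Year']).tolist()
GreenGasEmissionsG20[gas_cols] = GreenGasEmissionsG20.groupby('Country')[gas_cols].transform(lambda group: group.interpolate(method='linear', limit_direction='both'))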
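To make the behavior of this step concrete, the toy example below (made-up values for two hypothetical countries A and B) applies the same zero-to-NaN replacement and per-country linear interpolation to a single column. With `limit_direction='both'`, a gap between two measurements is filled linearly, while leading and trailing gaps are filled with the nearest measured value rather than extrapolated. This is why, for example, Argentina has an SF₆ value for 1985 in Out[25] although the raw data in Out[17] has NaN for that year. The sketch uses `SeriesGroupBy.transform`, which is one of several ways to keep the interpolation within each country.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 0 marks a missing measurement, as assumed in this step
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B"],
    "Year": [1990, 1991, 1992, 1993, 1990, 1991, 1992],
    "SF6": [np.nan, 2.0, np.nan, 6.0, 0.0, 10.0, np.nan],
})

toy["SF6"] = toy["SF6"].replace(0, np.nan)

# Interpolate within each country only, so values never leak across countries.
# Interior gap -> linear value; leading/trailing gaps -> nearest measured value.
toy["SF6"] = toy.groupby("Country")["SF6"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="both"))

print(toy["SF6"].tolist())
```

Country A's interior gap becomes 4.0 (halfway between 2.0 and 6.0), while its leading gap and all of country B's gaps simply repeat the nearest measurement. Filling the edges this way keeps the series complete for plotting, but it also means the early years of sparsely measured gases are flat copies rather than observations.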

3.7 Transforming the data: converting the units from tonnes to million tonnes in the CO2emissionsG20 and GreenGasEmissionsG20 datasets

In [21]:
#CO2emissionsG20 and GreenGasEmissionsG20 are in Million tonnes from now onwards
CO2emissionsG20['Annual CO₂ emissions (tonnes)'] = CO2emissionsG20['Annual CO₂ emissions (tonnes)']/1000000 #Million tonnes
GreenGasEmissionsG20['SF₆ gases (tonnes)'] = GreenGasEmissionsG20['SF₆ gases (tonnes)']/1000000 
GreenGasEmissionsG20['PFC gases (tonnes)'] = GreenGasEmissionsG20['PFC gases (tonnes)']/1000000
GreenGasEmissionsG20['HFC gases (tonnes)'] = GreenGasEmissionsG20['HFC gases (tonnes)']/1000000
GreenGasEmissionsG20['Nitrous oxide (N₂O) (tonnes)'] = GreenGasEmissionsG20['Nitrous oxide (N₂O) (tonnes)']/1000000
GreenGasEmissionsG20['Methane (CH₄) (tonnes)'] = GreenGasEmissionsG20['Methane (CH₄) (tonnes)']/1000000
GreenGasEmissionsG20['Carbon Dioxide (CO₂) (tonnes)'] = GreenGasEmissionsG20['Carbon Dioxide (CO₂) (tonnes)']/1000000

3.8 Renaming the columns in the CO2emissionsG20 and GreenGasEmissionsG20 datasets to reflect the unit (million tonnes)

In [22]:
CO2emissionsG20.rename(columns={"Annual CO₂ emissions (tonnes)": "Annual CO₂ emissions (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.rename(columns={"SF₆ gases (tonnes)": "SF₆ gases (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.rename(columns={"PFC gases (tonnes)": "PFC gases (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.rename(columns={"HFC gases (tonnes)": "HFC gases (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.rename(columns={"Nitrous oxide (N₂O) (tonnes)": "Nitrous oxide (N₂O) (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.rename(columns={"Methane (CH₄) (tonnes)": "Methane (CH₄) (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.rename(columns={"Carbon Dioxide (CO₂) (tonnes)": "Carbon Dioxide (CO₂) (Million tonnes)"}, inplace = True)
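The unit conversion in 3.7 and the renaming in 3.8 can also be expressed as one loop over the affected columns. The helper `to_million_tonnes` below is a hypothetical name introduced for illustration; it divides each "(tonnes)" column by 1,000,000 and renames it in the same pass, so a column's values and its unit label cannot drift apart. The sample rows are Argentina's 1985 and 1986 N₂O and methane values from Out[17].

```python
import pandas as pd

def to_million_tonnes(df, columns):
    """Return a copy of df with each '(tonnes)' column divided by 1e6
    and its label changed to '(Million tonnes)'."""
    out = df.copy()
    new_names = {}
    for col in columns:
        out[col] = out[col] / 1_000_000
        # Only the unit suffix matches; '(N₂O)' and '(CH₄)' are left alone
        new_names[col] = col.replace("(tonnes)", "(Million tonnes)")
    return out.rename(columns=new_names)

# Usage on two Argentina rows from Out[17]
ghg = pd.DataFrame({
    "Country": ["Argentina", "Argentina"],
    "Year": [1985, 1986],
    "Nitrous oxide (N₂O) (tonnes)": [35305280.0, 35461830.0],
    "Methane (CH₄) (tonnes)": [94616900.0, 94628500.0],
})
ghg = to_million_tonnes(ghg, [c for c in ghg.columns if c.endswith("(tonnes)")])
print(ghg.columns.tolist())
```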
In [23]:
CO2emissionsG20.head()
Out[23]:
Country Code Year Annual CO₂ emissions (Million tonnes)
0 Argentina ARG 1985 100.366195
1 Argentina ARG 1986 103.933940
2 Argentina ARG 1987 114.611064
3 Argentina ARG 1988 121.129121
4 Argentina ARG 1989 116.802018

3.9 Creating a (Country, Year) index for the GreenGasEmissionsG20 dataset and dropping the Country, Code, and Year columns

In [24]:
GreenGasEmissionsG20.index = [GreenGasEmissionsG20.Country, GreenGasEmissionsG20.Year]
GreenGasEmissionsG20_1 = GreenGasEmissionsG20.drop(columns = 'Code')
GreenGasEmissionsG20.drop(columns = ['Country', 'Code', 'Year'], inplace = True)

3.10. Stacking and resetting the index to reshape the data from wide format (one column per gas) to long format (one row per country, year, and gas), then renaming the column headings

In [25]:
GreenGasEmissionsG20 = GreenGasEmissionsG20.stack().reset_index()
GreenGasEmissionsG20.rename(columns={"level_2": "Green House Gases"}, inplace = True)
GreenGasEmissionsG20.rename(columns={0: "GreenGasEmissions (Million tonnes)"}, inplace = True)
GreenGasEmissionsG20.head()
Out[25]:
Country Year Green House Gases GreenGasEmissions (Million tonnes)
0 Argentina 1985 SF₆ gases (Million tonnes) 0.14960
1 Argentina 1985 PFC gases (Million tonnes) 1.94060
2 Argentina 1985 HFC gases (Million tonnes) 0.20630
3 Argentina 1985 Nitrous oxide (N₂O) (Million tonnes) 35.30528
4 Argentina 1985 Methane (CH₄) (Million tonnes) 94.61690
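Stacking moves the gas column labels into a third index level, which `reset_index` turns into the `level_2` column renamed above. The sketch below reproduces this on two Argentina rows (values from Out[17], converted to million tonnes) and compares it with `DataFrame.melt`, which performs the same wide-to-long reshape and lets the new column names be set directly. The two results differ only in row order, so they are sorted before comparing. One difference to keep in mind: the legacy `stack()` drops NaN entries by default, whereas `melt` keeps them.

```python
import pandas as pd

# Wide table indexed by (Country, Year), as after cell In [24]; values from Out[17] in million tonnes
wide = pd.DataFrame(
    {
        "Nitrous oxide (N₂O) (Million tonnes)": [35.30528, 35.46183],
        "Methane (CH₄) (Million tonnes)": [94.6169, 94.6285],
    },
    index=pd.MultiIndex.from_tuples(
        [("Argentina", 1985), ("Argentina", 1986)], names=["Country", "Year"]),
)

# stack(): one row per (Country, Year, gas); the unnamed gas level becomes 'level_2'
long_stack = wide.stack().reset_index().rename(
    columns={"level_2": "Green House Gases", 0: "GreenGasEmissions (Million tonnes)"})

# melt(): the same reshape, with the new column names given directly
long_melt = wide.reset_index().melt(
    id_vars=["Country", "Year"],
    var_name="Green House Gases",
    value_name="GreenGasEmissions (Million tonnes)")

keys = ["Country", "Year", "Green House Gases"]
a = long_stack.sort_values(keys).reset_index(drop=True)
b = long_melt.sort_values(keys).reset_index(drop=True)
print(a)
```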

4.0 Guiding Questions:

One of the main global concerns is the amount of fossil fuels burnt and, consequently, the amount of air pollutants released into the atmosphere by burning them. Many health ailments, and consequently deaths, are linked to the extent of exposure to air pollutants. “Collectively, the G20 economies account for around 90% of the gross world product (GWP), 80% of world trade (or, if excluding EU intra-trade, 75%), two-thirds of the world population, and approximately half of the world land area.”[5] Thus, the global economy is largely driven by the G20 nations, which include the world's largest economies, and to meet their expanding energy needs these countries are also the largest consumers of fossil fuels. Hence, we want to analyze how their economic growth influences the consumption of fossil fuels and the emission of air pollutants such as CO and CO2. Another point of interest in this study is how much renewable energy these countries use for their energy needs, and whether this has decreased the amount of air pollutants released into the atmosphere. Below are the questions we address by analyzing the data:

  • How has the amount of air pollutants released by the G20 countries changed from 1985 to 2015?
  • How did the emissions of the different greenhouse gases released by the G20 countries vary between 1985 and 2015?
  • How did the mix of fossil fuels used by the G20 countries change between 1985 and 2015?
  • How did the consumption of the different fossil fuels affect the air pollutant emissions of these countries?
  • How did GDP grow in the G20 countries between 1985 and 2015? Did this growth affect their fossil fuel consumption and air pollutant emissions?
  • How did the use of different renewable energy sources vary in the G20 countries between 1985 and 2015, and how did it affect greenhouse gas emissions?

[5] “Information about G20 countries”, Wikipedia. https://en.wikipedia.org/wiki/G20

5.0 Visualizing the change in air pollutant (CO2) emissions by the G20 countries from 1985 to 2017

5.1 Line plot with linear axes

In [26]:
fig = px.line(CO2emissionsG20, x='Year', y='Annual CO₂ emissions (Million tonnes)',color='Country',
              line_group='Country', hover_name='Country')
fig.update_layout(legend_orientation="h", legend=dict(x=-.1, y=1.3))
fig.show()

5.2 Line plot with log scale

In [27]:
fig = px.line(CO2emissionsG20, x='Year', y='Annual CO₂ emissions (Million tonnes)', line_group='Country', color='Country',
              hover_name='Country')
# Log scale on the emissions (y) axis only; a log scale on Year does not improve readability
fig.update_layout(yaxis_type="log", legend_orientation="h", legend=dict(x=-.1, y=1.3))
fig.show()

5.3 Analysis:

The line graph shows the trend of CO2 emissions by country from 1985 to 2017. Annual CO2 emissions in million tonnes are on the y-axis and years are on the x-axis. We plotted one line graph with linear axes and another with a log-scaled emissions axis, which makes the trends of the smaller emitters visible; on a linear scale they would be compressed near the bottom of the plot. Several trends can be observed. The US starts as the top CO2 emitter in 1985, and its emissions increase until 2007; they then decrease slightly and stabilize at around 5,500 million tonnes, in second place, by 2017. Russia starts in second place in 1985, grows slightly until 1990, and then drops over the next decade until 2000; after that it stabilizes and slowly climbs to 1,692 million tonnes (4th place) in 2014. China starts in 3rd place with 1,951 million tonnes and climbs to the top position with 9,838 million tonnes in 2014, about five times its 1985 value and nearly double the US value in 2017. India starts in 7th place with 424 million tonnes in 1985 and climbs to 3rd place with 2,466 million tonnes, close to six times its 1985 value. This growth reflects the rapid industrialization of India and China.
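The growth multiples quoted above can be checked directly. The sketch below uses only the figures stated in the analysis, pivots them to one column per year, and divides the later year by 1985. Assumption: the text does not give the year of India's 2,466 million tonnes, so it is treated here as a 2014 value. On the full CO2emissionsG20 dataframe, the same pivot would give the multiple for every country at once.

```python
import pandas as pd

# Figures quoted in the analysis (million tonnes).
# Assumption: India's 2,466 figure is treated as its 2014 value.
quoted = pd.DataFrame({
    "Country": ["China", "China", "India", "India"],
    "Year": [1985, 2014, 1985, 2014],
    "Annual CO₂ emissions (Million tonnes)": [1951, 9838, 424, 2466],
})

# One row per country, one column per year
wide = quoted.pivot(index="Country", columns="Year",
                    values="Annual CO₂ emissions (Million tonnes)")

# Ratio of the later value to the 1985 value
growth = (wide[2014] / wide[1985]).round(2)
print(growth)
```

China's multiple comes out at about 5.0 and India's at about 5.8.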

6.0. Visualization of the different greenhouse gases released by the G20 countries from 1985 to 2014

6.1 Without log scale

In [28]:
fig = px.line(GreenGasEmissionsG20, x='Year', y='GreenGasEmissions (Million tonnes)', color='Green House Gases',
              line_group='Country', hover_name='Country')
fig.update_layout(legend_orientation="h", legend=dict(x=-.1, y=1.1))
fig.show()

6.2 With log scale

In [29]:
fig = px.line(GreenGasEmissionsG20, x='Year', y='GreenGasEmissions (Million tonnes)', color='Green House Gases',
              line_group='Country', hover_name='Country')
# Log scale on the emissions (y) axis only; a log scale on Year does not improve readability
fig.update_layout(yaxis_type="log", legend_orientation="h", legend=dict(x=-.1, y=1.1))
fig.show()

6.3 Analysis:

The line graph shows the trends of the different greenhouse gas emissions by country from 1985 to 2014.

  • SF₆ gases: this gas is not measured every year, so it has fewer data points; the gaps were filled by the interpolation in Section 3.6 before plotting. The US is in the top spot in 1990 with 42 million tonnes and drops to second position in 2010 with 41 million tonnes, whereas China is in 11th position in 1990 with 1.7 million tonnes and climbs to the top position in 2010 with 57 million tonnes.
  • PFC gases: this gas is also not measured every year. The US is in the top position with 20.8 million tonnes in 1990 and drops to fourth position with 6.36 million tonnes in 2010, whereas Russia is in second place in 1990 with 15.8 million tonnes and climbs to the top position with 20.5 million tonnes in 2010.
  • HFC gases: this gas is also not measured every year. The US is in the top position with 29.18 million tonnes in 1990 and stays there with 300 million tonnes in 2010, whereas China is in third place in 1990 with 5.97 million tonnes and climbs to second position with 183.8 million tonnes in 2010.
  • Nitrous oxide (N₂O): China is in the top position with 908 million tonnes in 1985 and stays there with 1,752 million tonnes in 2014, whereas India is in fourth place in 1985 with 480 million tonnes and climbs to second position with 636 million tonnes in 2014.
  • Carbon dioxide (CO₂): omitted here because it is analyzed separately in Section 5.
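The rank changes described above, such as China moving to first place for SF₆, can be computed rather than read off the plot. The sketch below ranks countries within each year and gas using `groupby(...).rank` on a frame in the long format of GreenGasEmissionsG20. It contains only the two countries quoted for SF₆, so the ranks are relative to those two rather than to all 19 countries.

```python
import pandas as pd

# SF₆ figures quoted above for 1990 and 2010 (million tonnes), in long format
sf6 = pd.DataFrame({
    "Country": ["United States", "China", "United States", "China"],
    "Year": [1990, 1990, 2010, 2010],
    "Green House Gases": ["SF₆ gases (Million tonnes)"] * 4,
    "GreenGasEmissions (Million tonnes)": [42.0, 1.7, 41.0, 57.0],
})

# Rank 1 = largest emitter of that gas in that year; ties share the lowest rank
sf6["Rank"] = (sf6.groupby(["Year", "Green House Gases"])["GreenGasEmissions (Million tonnes)"]
               .rank(ascending=False, method="min")
               .astype(int))
print(sf6[["Country", "Year", "Rank"]])
```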

7.0 Visualization of different fossil fuels being used by the G20 countries from 1985 to 2015

In [30]:
# Create one dataframe per year (1985, 2000 and 2015), each holding only the rows of
# FossilFuelConsumptionG20 whose Year column equals that year
FossilFuelConsumptionG20_2015=FossilFuelConsumptionG20[FossilFuelConsumptionG20.Year==2015]
FossilFuelConsumptionG20_2000=FossilFuelConsumptionG20[FossilFuelConsumptionG20.Year==2000]
FossilFuelConsumptionG20_1985=FossilFuelConsumptionG20[FossilFuelConsumptionG20.Year==1985]

7.1 Visualization for year 1985

In [31]:
# Superimpose the three fuel types from the same dataframe as separate bar traces,
# using go.Figure from the plotly library
fig = go.Figure(go.Bar(x =FossilFuelConsumptionG20_1985.Country, y=FossilFuelConsumptionG20_1985['Oil (terawatt-hours)'], name='Oil (terawatt-hours)'))
fig.add_trace(go.Bar(x=FossilFuelConsumptionG20_1985.Country, y=FossilFuelConsumptionG20_1985['Gas (terawatt-hours)'], name='Gas (terawatt-hours)'))
fig.add_trace(go.Bar(x=FossilFuelConsumptionG20_1985.Country, y=FossilFuelConsumptionG20_1985['Coal (terawatt-hours)'], name='Coal (terawatt-hours)'))
# barmode='stack' stacks the three fuel traces into a single bar per country
fig.update_layout(barmode='stack', xaxis={'categoryorder':'category ascending'},title_text='G20 Fossil Fuels Usage in the year 1985',yaxis=dict(
        title='Terawatt-hours'))
fig.show()

7.2 Visualization for year 2000

In [32]:
#same idea as before, except for dataframe 2000(i.e values that are only for the year 2000)
fig = go.Figure(go.Bar(x =FossilFuelConsumptionG20_2000.Country, y=FossilFuelConsumptionG20_2000['Oil (terawatt-hours)'], name='Oil (terawatt-hours)'))
fig.add_trace(go.Bar(x=FossilFuelConsumptionG20_2000.Country, y=FossilFuelConsumptionG20_2000['Gas (terawatt-hours)'], name='Gas (terawatt-hours)'))
fig.add_trace(go.Bar(x=FossilFuelConsumptionG20_2000.Country, y=FossilFuelConsumptionG20_2000['Coal (terawatt-hours)'], name='Coal (terawatt-hours)'))

fig.update_layout(barmode='stack', xaxis={'categoryorder':'category ascending'},title_text='G20 Fossil Fuels Usage in the year 2000',yaxis=dict(
        title='Terawatt-hours'))
fig.show()

7.3 Visualization for year 2015

In [33]:
#same idea as before, except for dataframe 2015(i.e values that are only for the year 2015)
fig = go.Figure(go.Bar(x =FossilFuelConsumptionG20_2015.Country, y=FossilFuelConsumptionG20_2015['Oil (terawatt-hours)'], name='Oil (terawatt-hours)'))
fig.add_trace(go.Bar(x=FossilFuelConsumptionG20_2015.Country, y=FossilFuelConsumptionG20_2015['Gas (terawatt-hours)'], name='Gas (terawatt-hours)'))
fig.add_trace(go.Bar(x=FossilFuelConsumptionG20_2015.Country, y=FossilFuelConsumptionG20_2015['Coal (terawatt-hours)'], name='Coal (terawatt-hours)'))

fig.update_layout(barmode='stack', xaxis={'categoryorder':'category ascending'},title_text='G20 Fossil Fuels Usage in the year 2015',yaxis=dict(
        title='Terawatt-hours'))
fig.show()

7.4 Analysis:

The bar graphs show the consumption of the different fossil fuels by country in 1985, 2000, and 2015. The US starts as the top fossil fuel user in 1985 with 18,566 terawatt-hours in total: 8,256, 5,188, and 5,122 terawatt-hours of Oil, Gas, and Coal, respectively. China, on the other hand, starts in third position with 5,918 terawatt-hours in total: 1,043, 139, and 4,736 terawatt-hours of Oil, Gas, and Coal, respectively. By 2015 China has become the top fossil fuel user with 30,827 terawatt-hours in total (6,534, 2,038, and 22,255 terawatt-hours of Oil, Gas, and Coal, respectively), whereas the US is in second place with 22,779 terawatt-hours in total (9,960, 8,263, and 4,556 terawatt-hours of Oil, Gas, and Coal, respectively). This is a double whammy: China not only became the top fossil fuel user, but its fuel mix is also alarming because of how much Coal it uses.
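The totals and the point about China's fuel mix follow from simple row arithmetic. The sketch below rebuilds the quoted figures as a small frame indexed by (Country, Year) and adds the total and the coal share; the frame name `mix` is illustrative.

```python
import pandas as pd

# Consumption figures quoted above (terawatt-hours)
mix = pd.DataFrame(
    {"Oil": [8256, 1043, 6534, 9960],
     "Gas": [5188, 139, 2038, 8263],
     "Coal": [5122, 4736, 22255, 4556]},
    index=pd.MultiIndex.from_tuples(
        [("United States", 1985), ("China", 1985), ("China", 2015), ("United States", 2015)],
        names=["Country", "Year"]),
)

# Total fossil fuel consumption and the share of it that is coal
mix["Total"] = mix[["Oil", "Gas", "Coal"]].sum(axis=1)
mix["Coal share (%)"] = (mix["Coal"] / mix["Total"] * 100).round(1)
print(mix)
```

Coal makes up about 72% of China's fossil fuel consumption in 2015, compared with about 20% for the US, which quantifies the "alarming" composition noted above.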

8.0 Visualizing the impact of the consumption of different fossil fuels on CO2 emissions by the G20 countries in 1992, 2002, and 2014 using a Sankey diagram

8.1 Data Wrangling for creating Sankey Diagram.

The CO2bySourceG20 and FossilFuelConsumptionG20 datasets are used in this visualization. Their units differ: CO2bySourceG20 reports CO2 emissions in tonnes by source (Oil, Coal, and Gas), whereas FossilFuelConsumptionG20 reports fuel consumption in terawatt-hours. To make them comparable, we convert each to percentage shares of the total for each country and year: the share of CO2 emissions from burning Oil, Coal, and Gas, and the share of Oil, Gas, and Coal in fossil fuel consumption. For example, in 2014, 47.88% of Argentina's CO2 emissions from these three fuels came from burning Oil, 49.16% from Gas, and 2.96% from Coal. The corresponding fossil fuel consumption shares for Argentina in 2014 were 41.62%, 56.40%, and 1.98% for Oil, Gas, and Coal.
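The percentage shares can be computed for every row at once by dividing by the row total with `DataFrame.div(..., axis=0)`, which is equivalent to the column-by-column arithmetic in cells In [36] and In [37]. The sketch below applies it to the first two Argentina rows from Out[15]; the results match the shares shown in Out[37] (56.51% Oil, 41.67% Gas, and 1.81% Coal in 1985).

```python
import pandas as pd

# First two Argentina rows of FossilFuelConsumptionG20 (Out[15]), in terawatt-hours
fuel = pd.DataFrame(
    {"Oil (terawatt-hours)": [217.81827, 250.64976],
     "Gas (terawatt-hours)": [160.62700, 184.53950],
     "Coal (terawatt-hours)": [6.990557, 10.407628]},
    index=pd.MultiIndex.from_tuples([("Argentina", 1985), ("Argentina", 1986)],
                                    names=["Country", "Year"]),
)

# Divide every fuel column by its row total in one step, then scale to percent
shares = fuel.div(fuel.sum(axis=1), axis=0) * 100
print(shares.round(2))
```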

Displaying datasets

In [34]:
CO2bySourceG20.head()
Out[34]:
Country Code Year Oil (tonnes) Coal (tonnes) Gas (tonnes)
0 Argentina ARG 1985 60437680 2773648 28810032.0
1 Argentina ARG 1986 62200064 3392864 30964464.0
2 Argentina ARG 1987 66453968 3777584 36698624.0
3 Argentina ARG 1988 66230464 4023072 42231264.0
4 Argentina ARG 1989 59752512 4143984 44609200.0
In [35]:
FossilFuelConsumptionG20.head()
Out[35]:
Country Code Year Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours)
0 Argentina ARG 1985 217.81827 160.62700 6.990557
1 Argentina ARG 1986 250.64976 184.53950 10.407628
2 Argentina ARG 1987 267.57141 186.74975 11.553994
3 Argentina ARG 1988 267.60630 217.68250 11.993011
4 Argentina ARG 1989 242.68321 232.16600 12.009020
In [36]:
CO2bySourceG20_P = CO2bySourceG20.drop(columns = ['Code'])
CO2bySourceG20_P['Total'] = CO2bySourceG20_P['Oil (tonnes)'] + CO2bySourceG20_P['Coal (tonnes)'] + CO2bySourceG20_P['Gas (tonnes)']
CO2bySourceG20_P['Oil (tonnes)'] = CO2bySourceG20_P['Oil (tonnes)']/CO2bySourceG20_P['Total']*100
CO2bySourceG20_P['Coal (tonnes)'] = CO2bySourceG20_P['Coal (tonnes)']/CO2bySourceG20_P['Total']*100
CO2bySourceG20_P['Gas (tonnes)'] = CO2bySourceG20_P['Gas (tonnes)']/CO2bySourceG20_P['Total']*100
CO2bySourceG20_P.drop(columns = ['Total'], inplace=True)
CO2bySourceG20_P.index = [CO2bySourceG20_P.Country, CO2bySourceG20_P.Year]
CO2bySourceG20_P = CO2bySourceG20_P[['Oil (tonnes)', 'Gas (tonnes)', 'Coal (tonnes)']]
CO2bySourceG20_1992 = CO2bySourceG20_P.xs(1992, level=1, axis=0, drop_level=False)
CO2bySourceG20_2002 = CO2bySourceG20_P.xs(2002, level=1, axis=0, drop_level=False)
CO2bySourceG20_2014 = CO2bySourceG20_P.xs(2014, level=1, axis=0, drop_level=False)
CO2bySourceG20_2014.head()
Out[36]:
Oil (tonnes) Gas (tonnes) Coal (tonnes)
Country Year
Argentina 2014 47.883149 49.156429 2.960422
Australia 2014 33.744441 20.383792 45.871767
Brazil 2014 69.078004 15.912283 15.009713
Canada 2014 47.084550 38.545404 14.370046
China 2014 13.969367 3.910653 82.119980
In [37]:
Sankeydf = FossilFuelConsumptionG20.drop(columns = ['Code'])
Sankeydf.index = [Sankeydf.Country, Sankeydf.Year]
Sankeydf['Total'] = Sankeydf['Oil (terawatt-hours)'] + Sankeydf['Gas (terawatt-hours)'] + Sankeydf['Coal (terawatt-hours)']
Sankeydf['Oil (terawatt-hours)'] = Sankeydf['Oil (terawatt-hours)']/Sankeydf['Total']*100
Sankeydf['Gas (terawatt-hours)'] = Sankeydf['Gas (terawatt-hours)']/Sankeydf['Total']*100
Sankeydf['Coal (terawatt-hours)'] = Sankeydf['Coal (terawatt-hours)']/Sankeydf['Total']*100
Sankeydf.drop(columns = ['Total','Country', 'Year'], inplace=True)
Sankeydf.head()
Out[37]:
Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours)
Country Year
Argentina 1985 56.512201 41.674123 1.813676
1986 56.250339 41.414001 2.335660
1987 57.434145 40.085793 2.480062
1988 53.813812 43.774475 2.411713
1989 49.846792 47.686572 2.466636

8.2 Slicing the dataset for the years 1992, 2002, and 2014

In [38]:
Sankeydf_1992 = Sankeydf.xs(1992, level=1, axis=0, drop_level=False)
Sankeydf_2002 = Sankeydf.xs(2002, level=1, axis=0, drop_level=False)
Sankeydf_2014 = Sankeydf.xs(2014, level=1, axis=0, drop_level=False)
Sankeydf_2014.head()
Out[38]:
Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours)
Country Year
Argentina 2014 41.618893 56.402805 1.978302
Australia 2014 38.434226 27.509126 34.056648
Brazil 2014 73.940911 17.456227 8.602861
Canada 2014 47.591765 43.300711 9.107524
China 2014 19.908048 6.393669 73.698284
In [71]:
Sankeydf_2002.head()
Out[71]:
Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours)
Country Year
Argentina 2002 38.124068 60.691806 1.184126
Australia 2002 36.125434 18.549505 45.325060
Brazil 2002 79.162163 10.548240 10.289597
Canada 2002 45.334512 39.208462 15.457027
China 2002 22.775738 2.494227 74.730035
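`xs` takes a cross-section for a single year. A boolean mask on the `Year` index level selects the same rows and extends to several years at once with `isin`. The sketch below demonstrates this on oil shares for Argentina and Australia taken from Out[38] and Out[71]; the frame name `oil` is illustrative.

```python
import pandas as pd

# Oil shares (%) for two countries and two years, from Out[38] and Out[71]
oil = pd.DataFrame(
    {"Oil (terawatt-hours)": [38.124068, 41.618893, 36.125434, 38.434226]},
    index=pd.MultiIndex.from_tuples(
        [("Argentina", 2002), ("Argentina", 2014), ("Australia", 2002), ("Australia", 2014)],
        names=["Country", "Year"]),
)

# Cross-section at Year == 2014, keeping both index levels as in cell In [38]
by_xs = oil.xs(2014, level="Year", drop_level=False)

# The same rows via a boolean mask on the Year level
by_mask = oil[oil.index.get_level_values("Year") == 2014]

# The mask form extends naturally to several years at once
several = oil[oil.index.get_level_values("Year").isin([2002, 2014])]
print(by_xs)
```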

8.3 Displaying the values for the Sankey diagram

In [39]:
display(Sankeydf_2002.stack().values)
display(CO2bySourceG20_2002.stack().values)
array([3.81240684e+01, 6.06918057e+01, 1.18412591e+00, 3.61254343e+01,
       1.85495054e+01, 4.53250603e+01, 7.91621632e+01, 1.05482397e+01,
       1.02895971e+01, 4.53345117e+01, 3.92084617e+01, 1.54570266e+01,
       2.27757376e+01, 2.49422732e+00, 7.47300351e+01, 6.48823802e+01,
       2.61909126e+01, 8.92670713e+00, 4.42091375e+01, 2.60456405e+01,
       2.97452220e+01, 3.63869469e+01, 7.97913246e+00, 5.56339206e+01,
       5.34487719e+01, 3.05450679e+01, 1.60061602e+01, 5.63875021e+01,
       3.52830847e+01, 8.32941319e+00, 5.90862738e+01, 1.58066294e+01,
       2.51070968e+01, 6.05729168e+01, 3.06130727e+01, 8.81401043e+00,
       2.18256329e+01, 5.96495964e+01, 1.85247707e+01, 6.05173174e+01,
       3.94730915e+01, 9.59111809e-03, 2.42039626e+01, 9.86353507e-01,
       7.48096839e+01, 6.01168460e+01, 1.18644281e+01, 2.80187259e+01,
       4.69228407e+01, 2.37771785e+01, 2.92999808e+01, 3.92075924e+01,
       4.28889257e+01, 1.79034819e+01, 4.35681614e+01, 2.92137898e+01,
       2.72180488e+01])
array([47.43364972, 51.65951273,  0.90683756, 26.67884879, 14.57395201,
       58.7471992 , 74.99910906,  9.18378256, 15.81710837, 44.06296873,
       33.03622938, 22.90080189, 18.7353502 ,  1.54891116, 79.71573864,
       62.69670815, 23.49658772, 13.80670413, 38.48597404, 21.32812782,
       40.18589813, 28.65073218,  4.74493748, 66.60433035, 56.0522732 ,
       21.35842238, 22.58930442, 57.37809331, 30.60787744, 12.01402925,
       54.10416796, 12.86926364, 33.0265684 , 67.65373514, 23.49907988,
        8.84718499, 21.33322703, 49.14027827, 29.5264947 , 69.80351482,
       30.19648518,  0.        ,  6.3758494 ,  1.17069211, 92.45345848,
       48.25377421, 11.09194584, 40.65427995, 42.05156994, 17.81524643,
       40.13318363, 36.13806458, 37.89638089, 25.96555453, 41.22000959,
       21.9249355 , 36.85505491])

8.4 Creating the values for the Sankey diagram by concatenating the two arrays

In [40]:
SankeyValues1992 = np.concatenate((Sankeydf_1992.stack().values, CO2bySourceG20_1992.stack().values))
SankeyValues2002 = np.concatenate((Sankeydf_2002.stack().values, CO2bySourceG20_2002.stack().values))
SankeyValues2014 = np.concatenate((Sankeydf_2014.stack().values, CO2bySourceG20_2014.stack().values))
SankeyValues2002
Out[40]:
array([3.81240684e+01, 6.06918057e+01, 1.18412591e+00, 3.61254343e+01,
       1.85495054e+01, 4.53250603e+01, 7.91621632e+01, 1.05482397e+01,
       1.02895971e+01, 4.53345117e+01, 3.92084617e+01, 1.54570266e+01,
       2.27757376e+01, 2.49422732e+00, 7.47300351e+01, 6.48823802e+01,
       2.61909126e+01, 8.92670713e+00, 4.42091375e+01, 2.60456405e+01,
       2.97452220e+01, 3.63869469e+01, 7.97913246e+00, 5.56339206e+01,
       5.34487719e+01, 3.05450679e+01, 1.60061602e+01, 5.63875021e+01,
       3.52830847e+01, 8.32941319e+00, 5.90862738e+01, 1.58066294e+01,
       2.51070968e+01, 6.05729168e+01, 3.06130727e+01, 8.81401043e+00,
       2.18256329e+01, 5.96495964e+01, 1.85247707e+01, 6.05173174e+01,
       3.94730915e+01, 9.59111809e-03, 2.42039626e+01, 9.86353507e-01,
       7.48096839e+01, 6.01168460e+01, 1.18644281e+01, 2.80187259e+01,
       4.69228407e+01, 2.37771785e+01, 2.92999808e+01, 3.92075924e+01,
       4.28889257e+01, 1.79034819e+01, 4.35681614e+01, 2.92137898e+01,
       2.72180488e+01, 4.74336497e+01, 5.16595127e+01, 9.06837555e-01,
       2.66788488e+01, 1.45739520e+01, 5.87471992e+01, 7.49991091e+01,
       9.18378256e+00, 1.58171084e+01, 4.40629687e+01, 3.30362294e+01,
       2.29008019e+01, 1.87353502e+01, 1.54891116e+00, 7.97157386e+01,
       6.26967081e+01, 2.34965877e+01, 1.38067041e+01, 3.84859740e+01,
       2.13281278e+01, 4.01858981e+01, 2.86507322e+01, 4.74493748e+00,
       6.66043303e+01, 5.60522732e+01, 2.13584224e+01, 2.25893044e+01,
       5.73780933e+01, 3.06078774e+01, 1.20140292e+01, 5.41041680e+01,
       1.28692636e+01, 3.30265684e+01, 6.76537351e+01, 2.34990799e+01,
       8.84718499e+00, 2.13332270e+01, 4.91402783e+01, 2.95264947e+01,
       6.98035148e+01, 3.01964852e+01, 0.00000000e+00, 6.37584940e+00,
       1.17069211e+00, 9.24534585e+01, 4.82537742e+01, 1.10919458e+01,
       4.06542799e+01, 4.20515699e+01, 1.78152464e+01, 4.01331836e+01,
       3.61380646e+01, 3.78963809e+01, 2.59655545e+01, 4.12200096e+01,
       2.19249355e+01, 3.68550549e+01])
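The Sankey links only line up with these values because `stack()` flattens a frame row by row: all three fuels of the first country, then all three of the next, and so on. The sketch below shows that order on two countries, using values printed by In [39]; it assumes `CO2bySourceG20` uses the same Oil, Gas, Coal column order as `Sankeydf`, which is consistent with China's roughly 82% coal share in the 2014 rows at the top of this section.

```python
import numpy as np
import pandas as pd

countries = ["Argentina", "Australia"]
# 2002 fuel-consumption shares (%) from Out[71]
fuel_share = pd.DataFrame(
    {"Oil": [38.124068, 36.125434], "Gas": [60.691806, 18.549505], "Coal": [1.184126, 45.325060]},
    index=countries,
)
# 2002 CO2-by-source shares (%): the first six values printed by In [39]
co2_share = pd.DataFrame(
    {"Oil": [47.43364972, 26.67884879], "Gas": [51.65951273, 14.57395201], "Coal": [0.90683756, 58.7471992]},
    index=countries,
)

# stack() moves the columns into the inner index level, so values come out row by row:
# Argentina-Oil, Argentina-Gas, Argentina-Coal, Australia-Oil, ...
values = np.concatenate((fuel_share.stack().values, co2_share.stack().values))
print(values)

# One value per Sankey link: countries * fuels links on each side of the fuel nodes
assert len(values) == 2 * len(countries) * fuel_share.shape[1]
```

For the 19 countries in the notebook the same check gives 2 * 19 * 3 = 114 values, one per link.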

8.5 Sankey Diagram for year 1992

In [41]:
data = dict(type='sankey', valueformat = ".2f", valuesuffix = "%", node = dict(pad = 15,\
            thickness = 20, line = dict(color = "black", width = 0.5),
      #label the nodes of the Sankey diagram
      label = ['Argentina', 'Australia', 'Brazil', 'Canada', 'China', 'France', 'Germany', 'India', 'Indonesia', 'Italy',\
               'Japan', 'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 'South Korea', 'Turkey', 'United Kingdom', \
               'United States', "Oil", "Gas", "Coal", 'Argentina CO2 Emissions', 'Australia CO2 Emissions', \
               'Brazil CO2 Emissions', 'Canada CO2 Emissions', 'China CO2 Emissions', 'France CO2 Emissions', \
               'Germany CO2 Emissions', 'India CO2 Emissions', 'Indonesia CO2 Emissions', 'Italy CO2 Emissions', \
               'Japan CO2 Emissions', 'Mexico CO2 Emissions', 'Russia CO2 Emissions', 'Saudi Arabia CO2 Emissions', \
               'South Africa CO2 Emissions', 'South Korea CO2 Emissions', 'Turkey CO2 Emissions', \
               'United Kingdom CO2 Emissions', 'United States CO2 Emissions'],
      #assign colors to the nodes
      color = ['darkgreen', 'deepskyblue', 'aqua', 'darksalmon', 'darkmagenta', 'darkorchid', 'darkolivegreen', 'orangered', 'blue',\
               'brown', 'burlywood', 'cadetblue','chartreuse', 'chocolate', 'coral', 'cornflowerblue','cornsilk', \
               'crimson', 'darkkhaki','green','tomato', 'yellow','darkgreen', 'deepskyblue', 'aqua', 'darksalmon', 'darkmagenta',\
               'darkorchid', 'darkolivegreen', 'orangered', 'blue','brown', 'burlywood', 'cadetblue','chartreuse', 'chocolate', \
               'coral', 'cornflowerblue','cornsilk','crimson', 'darkkhaki']),
    #link the nodes: each country (0-18) flows into Oil, Gas and Coal (19-21), and each fuel flows into every
    #country's CO2 node (22-40); the link values are the percentage shares in SankeyValues1992
    link = dict(
      source = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12,13,13,13,14,14,14,15,\
                15,15,16,16,16,17,17,17,18,18,18,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,\
                20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21],
      target = [19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,\
                19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,22,22,22,23,23,23,24,24,24,25,25,25,26,26,26,\
                27,27,27,28,28,28,29,29,29,30,30,30,31,31,31,32,32,32,33,33,33,34,34,34,35,35,35,36,36,36,37,37,37,38,38,38,\
                39,39,39,40,40,40],
      value = SankeyValues1992))
#sets the layout for the plot
layout =  go.Layout(title = "Fossil Fuels Consumption and CO2 Emissions by G20 Countries (1992)", font = dict(size = 15))
fig = go.Figure(data=[data], layout=layout)
py.iplot(fig, validate=False) #plots the diagram
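The hard-coded `source` and `target` lists follow a fixed pattern, so they can be generated instead of typed by hand. A minimal sketch, assuming the node order used in the label list above (19 countries, then Oil, Gas and Coal, then the 19 CO2 nodes); `build_sankey_links` is a hypothetical helper name, not part of the notebook.

```python
def build_sankey_links(n_countries, n_fuels=3):
    """Return (source, target) index lists for a country -> fuel -> CO2 Sankey.

    Assumed node layout (matching the label list above):
      0 .. n_countries-1                     country nodes
      n_countries .. n_countries+n_fuels-1   fuel nodes (Oil, Gas, Coal)
      the next n_countries nodes             '<country> CO2 Emissions' nodes
    """
    first_fuel = n_countries
    first_emission = n_countries + n_fuels
    source, target = [], []
    # Stage 1: every country feeds every fuel node
    for c in range(n_countries):
        for f in range(n_fuels):
            source.append(c)
            target.append(first_fuel + f)
    # Stage 2: every fuel node feeds each country's CO2 emissions node
    for c in range(n_countries):
        for f in range(n_fuels):
            source.append(first_fuel + f)
            target.append(first_emission + c)
    return source, target


source, target = build_sankey_links(19)
print(len(source), source[:6], target[57:60])
```

The generated lists can be passed as `link = dict(source = source, target = target, value = SankeyValues1992)`; with such a helper, the three near-identical cells for 1992, 2002 and 2014 could become one loop over the years.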

8.6 Sankey Diagram for year 2002

In [42]:
data = dict(type='sankey', valueformat = ".2f", valuesuffix = "%", node = dict(pad = 15,\
            thickness = 20, line = dict(color = "black", width = 0.5),
      #label the nodes of the Sankey diagram
      label = ['Argentina', 'Australia', 'Brazil', 'Canada', 'China', 'France', 'Germany', 'India', 'Indonesia', 'Italy',\
               'Japan', 'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 'South Korea', 'Turkey', 'United Kingdom', \
               'United States', "Oil", "Gas", "Coal", 'Argentina CO2 Emissions', 'Australia CO2 Emissions', \
               'Brazil CO2 Emissions', 'Canada CO2 Emissions', 'China CO2 Emissions', 'France CO2 Emissions', \
               'Germany CO2 Emissions', 'India CO2 Emissions', 'Indonesia CO2 Emissions', 'Italy CO2 Emissions', \
               'Japan CO2 Emissions', 'Mexico CO2 Emissions', 'Russia CO2 Emissions', 'Saudi Arabia CO2 Emissions', \
               'South Africa CO2 Emissions', 'South Korea CO2 Emissions', 'Turkey CO2 Emissions', \
               'United Kingdom CO2 Emissions', 'United States CO2 Emissions'],
      #assign colors to the nodes
      color = ['darkgreen', 'deepskyblue', 'aqua', 'darksalmon', 'darkmagenta', 'darkorchid', 'darkolivegreen', 'orangered', 'blue',\
               'brown', 'burlywood', 'cadetblue','chartreuse', 'chocolate', 'coral', 'cornflowerblue','cornsilk', \
               'crimson', 'darkkhaki','green','tomato', 'yellow','darkgreen', 'deepskyblue', 'aqua', 'darksalmon', 'darkmagenta',\
               'darkorchid', 'darkolivegreen', 'orangered', 'blue','brown', 'burlywood', 'cadetblue','chartreuse', 'chocolate', \
               'coral', 'cornflowerblue','cornsilk','crimson', 'darkkhaki']),
    #link the nodes: each country (0-18) flows into Oil, Gas and Coal (19-21), and each fuel flows into every
    #country's CO2 node (22-40); the link values are the percentage shares in SankeyValues2002
    link = dict(
      source = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12,13,13,13,14,14,14,15,\
                15,15,16,16,16,17,17,17,18,18,18,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,\
                20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21],
      target = [19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,\
                19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,22,22,22,23,23,23,24,24,24,25,25,25,26,26,26,\
                27,27,27,28,28,28,29,29,29,30,30,30,31,31,31,32,32,32,33,33,33,34,34,34,35,35,35,36,36,36,37,37,37,38,38,38,\
                39,39,39,40,40,40],
      value = SankeyValues2002))
#sets the layout for the plot
layout =  go.Layout(title = "Fossil Fuels Consumption and CO2 Emissions by G20 Countries (2002)", font = dict(size = 15))
fig = go.Figure(data=[data], layout=layout)
py.iplot(fig, validate=False) #plots the diagram

8.7 Sankey Diagram for year 2014

In [43]:
data = dict(type='sankey', valueformat = ".2f", valuesuffix = "%", node = dict(pad = 15,\
            thickness = 20, line = dict(color = "black", width = 0.5),
      #label the nodes of the Sankey diagram
      label = ['Argentina', 'Australia', 'Brazil', 'Canada', 'China', 'France', 'Germany', 'India', 'Indonesia', 'Italy',\
               'Japan', 'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 'South Korea', 'Turkey', 'United Kingdom', \
               'United States', "Oil", "Gas", "Coal", 'Argentina CO2 Emissions', 'Australia CO2 Emissions', \
               'Brazil CO2 Emissions', 'Canada CO2 Emissions', 'China CO2 Emissions', 'France CO2 Emissions', \
               'Germany CO2 Emissions', 'India CO2 Emissions', 'Indonesia CO2 Emissions', 'Italy CO2 Emissions', \
               'Japan CO2 Emissions', 'Mexico CO2 Emissions', 'Russia CO2 Emissions', 'Saudi Arabia CO2 Emissions', \
               'South Africa CO2 Emissions', 'South Korea CO2 Emissions', 'Turkey CO2 Emissions', \
               'United Kingdom CO2 Emissions', 'United States CO2 Emissions'],
      #assign colors to the nodes
      color = ['darkgreen', 'deepskyblue', 'aqua', 'darksalmon', 'darkmagenta', 'darkorchid', 'darkolivegreen', 'orangered', 'blue',\
               'brown', 'burlywood', 'cadetblue','chartreuse', 'chocolate', 'coral', 'cornflowerblue','cornsilk', \
               'crimson', 'darkkhaki','green','tomato', 'yellow','darkgreen', 'deepskyblue', 'aqua', 'darksalmon', 'darkmagenta',\
               'darkorchid', 'darkolivegreen', 'orangered', 'blue','brown', 'burlywood', 'cadetblue','chartreuse', 'chocolate', \
               'coral', 'cornflowerblue','cornsilk','crimson', 'darkkhaki']),
    #link the nodes: each country (0-18) flows into Oil, Gas and Coal (19-21), and each fuel flows into every
    #country's CO2 node (22-40); the link values are the percentage shares in SankeyValues2014
    link = dict(
      source = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12,13,13,13,14,14,14,15,\
                15,15,16,16,16,17,17,17,18,18,18,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,\
                20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21],
      target = [19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,\
                19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,19,20,21,22,22,22,23,23,23,24,24,24,25,25,25,26,26,26,\
                27,27,27,28,28,28,29,29,29,30,30,30,31,31,31,32,32,32,33,33,33,34,34,34,35,35,35,36,36,36,37,37,37,38,38,38,\
                39,39,39,40,40,40],
      value = SankeyValues2014))
#sets the layout for the plot
layout =  go.Layout(title = "Fossil Fuels Consumption and CO2 Emissions by G20 Countries (2014)", font = dict(size = 15))
fig = go.Figure(data=[data], layout=layout)
py.iplot(fig, validate=False) #plots the diagram

8.8 Analysis:

A Sankey diagram shows how quantities flow between categories. Here we use it to analyze how the consumption of different fossil fuels relates to the CO2 emissions of the G20 countries in 1992, 2002 and 2014. We wanted to compare the changes roughly every decade and picked these particular years because of limits on data availability. On the left side of each diagram, every G20 country represents 100% of its combined oil, gas and coal consumption, split among Oil, Gas and Coal according to that country's consumption of each fuel in that year. Similarly, the right side represents 100% of each country's CO2 emissions, mapped back to the central Oil, Gas and Coal nodes according to the source of those emissions. Of these fossil fuels, coal emits the most CO2 per unit of energy and is considered the most harmful to the environment.[6]

Comparing the diagrams, the US fossil fuel mix was 27.94%, 44.47% and 27.59% for gas, oil and coal, respectively, in 1992. It changed to 29.21%, 43.57% and 27.22% in 2002, and to 34.82%, 42.29% and 22.89% in 2014. This suggests that the US is limiting its coal consumption and moving towards cleaner fuel by reducing its share of coal and increasing its share of gas; a similar trend appears in other countries. The US shares of CO2 emissions from gas, oil and coal were 22.01%, 41.24% and 36.75% in 1992, 21.92%, 41.22% and 36.86% in 2002, and 27.56%, 40.68% and 31.75% in 2014. As the share of coal in consumption decreased, so did the share of CO2 emissions coming from coal. In absolute terms, US CO2 emissions were 4905.52, 5636.70 and 5249.98 million tonnes in 1992, 2002 and 2014, respectively.
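The comparison above mixes two kinds of change: shifts in the fuel mix are differences in percentage points, while the change in absolute emissions is a relative (percent) change. A short worked sketch using only the 1992 and 2014 US figures and the absolute emissions quoted in the text; `percent_change` is an illustrative helper, not part of the notebook.

```python
# US shares of oil + gas + coal consumption (%), as quoted in the analysis above
mix_1992 = {"Gas": 27.94, "Oil": 44.47, "Coal": 27.59}
mix_2014 = {"Gas": 34.82, "Oil": 42.29, "Coal": 22.89}

# Change in percentage points from 1992 to 2014 (not a percent change)
change_pp = {fuel: round(mix_2014[fuel] - mix_1992[fuel], 2) for fuel in mix_1992}
print(change_pp)  # {'Gas': 6.88, 'Oil': -2.18, 'Coal': -4.7}

# Absolute US CO2 emissions in million tonnes, as quoted above
us_co2 = {1992: 4905.51908, 2002: 5636.695938, 2014: 5249.982734}


def percent_change(start, end):
    """Relative change of US CO2 emissions between two years, in percent."""
    return (us_co2[end] - us_co2[start]) / us_co2[start] * 100


print(round(percent_change(1992, 2002), 1), round(percent_change(2002, 2014), 1))
```

On these figures, gas gains almost 7 percentage points while coal loses almost 5, and total emissions rise by about 15% up to 2002 before falling by about 7% by 2014.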

[6] “Pounds of CO2 emitted per million British thermal units (Btu) of energy for various fuels”

9.0 Visualization of the GDP growth of the G20 countries from 1985 to 2015 and its impact on fossil fuel consumption and air pollutant emissions

In [44]:
#define a new dataframe as an explicit copy; a plain assignment would only create a second name for
#FossilFuelConsumptionG20, so the column added here and the inplace drop below would modify it too
G20_fossilfuel_CO2=FossilFuelConsumptionG20.copy()
#total fossil fuel consumption: row-wise sum of the oil, gas and coal columns (terawatt-hours)
G20_fossilfuel_CO2['Total Fossil Fuels']=(G20_fossilfuel_CO2['Oil (terawatt-hours)']+G20_fossilfuel_CO2['Gas (terawatt-hours)']
                                          +G20_fossilfuel_CO2['Coal (terawatt-hours)'])
G20_fossilfuel_CO2.head()
Out[44]:
Country Code Year Oil (terawatt-hours) Gas (terawatt-hours) Coal (terawatt-hours) Total Fossil Fuels
0 Argentina ARG 1985 217.81827 160.62700 6.990557 385.435827
1 Argentina ARG 1986 250.64976 184.53950 10.407628 445.596888
2 Argentina ARG 1987 267.57141 186.74975 11.553994 465.875154
3 Argentina ARG 1988 267.60630 217.68250 11.993011 497.281811
4 Argentina ARG 1989 242.68321 232.16600 12.009020 486.858230
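Assigning one DataFrame variable to another does not copy it; both names point to the same object. The toy example below (hypothetical numbers) shows why an explicit `.copy()` matters when a cell adds a column and later drops columns in place, as In [44] and In [45] do.

```python
import pandas as pd

# Hypothetical two-row frame
original = pd.DataFrame({"Oil": [1.0, 2.0], "Gas": [3.0, 4.0]})

alias = original                    # same object: changes show up in `original`
alias["Total"] = alias["Oil"] + alias["Gas"]
print("Total" in original.columns)  # True

independent = original.copy()       # new object with its own data
independent.drop(columns=["Oil"], inplace=True)
print("Oil" in original.columns)    # True
```

Without the copy, the `drop(..., inplace=True)` in In [45] would also strip the oil, gas and coal columns from `FossilFuelConsumptionG20`.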
In [45]:
#dropped the columns that are not needed (inplace)
G20_fossilfuel_CO2.drop(['Oil (terawatt-hours)','Gas (terawatt-hours)','Coal (terawatt-hours)','Code'],axis=1,inplace=True)
G20_fossilfuel_CO2.head()
Out[45]:
Country Year Total Fossil Fuels
0 Argentina 1985 385.435827
1 Argentina 1986 445.596888
2 Argentina 1987 465.875154
3 Argentina 1988 497.281811
4 Argentina 1989 486.858230
In [46]:
#create one dataframe per year (1985, 2000 and 2015) using boolean indexing
G20_fossilfuel_CO2_1985=G20_fossilfuel_CO2[G20_fossilfuel_CO2.Year==1985]
G20_fossilfuel_CO2_2000=G20_fossilfuel_CO2[G20_fossilfuel_CO2.Year==2000]
G20_fossilfuel_CO2_2015=G20_fossilfuel_CO2[G20_fossilfuel_CO2.Year==2015]
G20_fossilfuel_CO2_1985.head()
Out[46]:
Country Year Total Fossil Fuels
0 Argentina 1985 385.435827
32 Australia 1985 840.617817
64 Brazil 1985 791.910415
96 Canada 1985 1763.458106
128 China 1985 5919.825992
In [47]:
CO2emissionsG20.head()
Out[47]:
Country Code Year Annual CO₂ emissions (Million tonnes)
0 Argentina ARG 1985 100.366195
1 Argentina ARG 1986 103.933940
2 Argentina ARG 1987 114.611064
3 Argentina ARG 1988 121.129121
4 Argentina ARG 1989 116.802018
In [48]:
#create CO2 emission dataframes for 1985, 2000 and 2015 (boolean indexing)
#and drop the columns that are not needed
CO2emissionsG20_1985_1=CO2emissionsG20[CO2emissionsG20.Year==1985]
CO2emissionsG20_1985_1=CO2emissionsG20_1985_1.drop(['Code','Year'],axis=1)
CO2emissionsG20_2000_1=CO2emissionsG20[CO2emissionsG20.Year==2000]
CO2emissionsG20_2000_1=CO2emissionsG20_2000_1.drop(['Code','Year'],axis=1)
CO2emissionsG20_2015_1=CO2emissionsG20[CO2emissionsG20.Year==2015]
CO2emissionsG20_2015_1=CO2emissionsG20_2015_1.drop(['Code','Year'],axis=1)
display(CO2emissionsG20_2000_1)
Country Annual CO₂ emissions (Million tonnes)
15 Argentina 141.716804
48 Australia 350.194582
81 Brazil 324.226040
114 Canada 572.530740
147 China 3349.294776
180 France 420.652709
213 Germany 900.960091
246 India 1029.637759
279 Indonesia 265.983184
312 Italy 470.767747
345 Japan 1262.734462
378 Mexico 397.899666
411 Russia 1499.616209
444 Saudi Arabia 296.353321
477 South Africa 377.730906
510 South Korea 445.440884
543 Turkey 226.029843
576 United Kingdom 567.061452
609 United States 6000.606067
In [49]:
#merge the dataframes on the Country column with an outer join (which keeps the union of keys from both dataframes)
G20_fossilfuel_CO2_1985_1=pd.merge(G20_fossilfuel_CO2_1985,CO2emissionsG20_1985_1, on='Country', how='outer')
G20_fossilfuel_CO2_2000_1=pd.merge(G20_fossilfuel_CO2_2000,CO2emissionsG20_2000_1, on='Country', how='outer')
G20_fossilfuel_CO2_2015_1=pd.merge(G20_fossilfuel_CO2_2015,CO2emissionsG20_2015_1, on='Country', how='outer')
In [50]:
G20_fossilfuel_CO2_1985_1.head()
Out[50]:
Country Year Total Fossil Fuels Annual CO₂ emissions (Million tonnes)
0 Argentina 1985 385.435827 100.366195
1 Australia 1985 840.617817 240.987972
2 Brazil 1985 791.910415 179.936803
3 Canada 1985 1763.458106 421.675597
4 China 1985 5919.825992 1951.773228
In [51]:
#create one GDP per capita dataframe for each of the years 1985, 2000 and 2015 (boolean indexing)
GDPpercapitaG20_1985=GDPpercapitaG20[GDPpercapitaG20.Year==1985]
GDPpercapitaG20_2000=GDPpercapitaG20[GDPpercapitaG20.Year==2000]
GDPpercapitaG20_2015=GDPpercapitaG20[GDPpercapitaG20.Year==2015]
In [52]:
#drop the columns that are not needed; reassigning the result (instead of inplace=True on a
#slice of GDPpercapitaG20) avoids the SettingWithCopyWarning
GDPpercapitaG20_1985=GDPpercapitaG20_1985.drop(['Code','Year'],axis=1)
GDPpercapitaG20_2000=GDPpercapitaG20_2000.drop(['Code','Year'],axis=1)
GDPpercapitaG20_2015=GDPpercapitaG20_2015.drop(['Code','Year'],axis=1)
In [53]:
#merge the G20_fossilfuel_CO2_*_1 dataframes with the GDP per capita dataframes (years 1985, 2000 and 2015)
G20_fossilfuel_CO2_1985_1=pd.merge(G20_fossilfuel_CO2_1985_1,GDPpercapitaG20_1985, on='Country', how='outer')
G20_fossilfuel_CO2_2000_1=pd.merge(G20_fossilfuel_CO2_2000_1,GDPpercapitaG20_2000, on='Country', how='outer')
G20_fossilfuel_CO2_2015_1=pd.merge(G20_fossilfuel_CO2_2015_1,GDPpercapitaG20_2015, on='Country', how='outer')
In [54]:
G20_fossilfuel_CO2_1985_1.head()
Out[54]:
Country Year Total Fossil Fuels Annual CO₂ emissions (Million tonnes) GDP per capita (2011USD)
0 Argentina 1985 385.435827 100.366195 12313
1 Australia 1985 840.617817 240.987972 24206
2 Brazil 1985 791.910415 179.936803 5207
3 Canada 1985 1763.458106 421.675597 27219
4 China 1985 5919.825992 1951.773228 2310
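The notebook filters each source frame three times and merges on `Country` after dropping `Year`. A hedged alternative is to filter all three years at once with `isin` and merge on both `Country` and `Year`; `indicator=True` then flags countries missing from either side, which outer joins otherwise fill silently with NaN. The numbers below come from Out[50] and Out[54], and Australia's GDP row is left out on purpose to show the flag.

```python
import pandas as pd

# Long-format inputs shaped like the notebook's frames (Country, Year, value)
fossil = pd.DataFrame({"Country": ["Argentina", "Australia"], "Year": [1985, 1985],
                       "Total Fossil Fuels": [385.435827, 840.617817]})
co2 = pd.DataFrame({"Country": ["Argentina", "Australia"], "Year": [1985, 1985],
                    "Annual CO₂ emissions (Million tonnes)": [100.366195, 240.987972]})
# Australia deliberately missing here, Brazil present only here
gdp = pd.DataFrame({"Country": ["Argentina", "Brazil"], "Year": [1985, 1985],
                    "GDP per capita (2011USD)": [12313, 5207]})

years = [1985, 2000, 2015]
# Filter all years at once and copy, so later edits never touch the source frames
frames = [df[df["Year"].isin(years)].copy() for df in (fossil, co2, gdp)]

merged = frames[0].merge(frames[1], on=["Country", "Year"], how="outer")
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = merged.merge(frames[2], on=["Country", "Year"], how="outer", indicator=True)
print(merged[merged["_merge"] != "both"])
```

Merging on both keys keeps the year in the result, so one table could feed all three scatter plots.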

9.1 Scatter plot for 1985

In [55]:
#scatter plot of total fossil fuel consumption (x) against annual CO2 emissions (y); the relationship is
#roughly linear: the more fossil fuel a country consumes, the more CO2 it emits. The bubble size
#represents each country's GDP per capita
fig = px.scatter(G20_fossilfuel_CO2_1985_1, x="Total Fossil Fuels", y="Annual CO₂ emissions (Million tonnes)", color="Country",
                 size='GDP per capita (2011USD)', hover_data=['GDP per capita (2011USD)'],
title='GDP growth and its respective impact on fossil fuel consumption and air pollutant emissions (Year=1985)')
fig.show()

9.2 Scatter plot for 2000

In [56]:
#For year 2000
fig = px.scatter(G20_fossilfuel_CO2_2000_1, x="Total Fossil Fuels", y="Annual CO₂ emissions (Million tonnes)", color="Country",
                 size='GDP per capita (2011USD)', hover_data=['GDP per capita (2011USD)'],
title='GDP growth and its respective impact on fossil fuel consumption and air pollutant emissions (Year=2000)')
fig.show()

9.3 Scatter plot for 2015

In [57]:
#For year 2015
fig = px.scatter(G20_fossilfuel_CO2_2015_1, x="Total Fossil Fuels", y="Annual CO₂ emissions (Million tonnes)", color="Country",
                 size='GDP per capita (2011USD)', hover_data=['GDP per capita (2011USD)'],
title='GDP growth and its respective impact on fossil fuel consumption and air pollutant emissions (Year=2015)')
fig.show()

9.4 Analysis:

The scatter plots show how the GDP of the G20 countries relates to their fossil fuel consumption and air pollutant emissions from 1985 to 2015. Fossil fuel consumption (terawatt-hours) is shown on the x-axis, CO2 emissions (million tonnes) on the y-axis, and GDP per capita is represented by the size of each bubble.
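The strength of the relationship the plots suggest can be put into a single number with Pearson's r via pandas `Series.corr`. This sketch uses only the five 1985 rows printed in Out[50]; with China dominating the spread, the value is illustrative rather than a finding. On the full data one would call the same method on `G20_fossilfuel_CO2_1985_1`, and correlating against `'GDP per capita (2011USD)'` would quantify the GDP link.

```python
import pandas as pd

# 1985 rows printed in Out[50] (first five G20 countries only)
df_1985 = pd.DataFrame({
    "Country": ["Argentina", "Australia", "Brazil", "Canada", "China"],
    "Total Fossil Fuels": [385.435827, 840.617817, 791.910415, 1763.458106, 5919.825992],
    "Annual CO₂ emissions (Million tonnes)": [100.366195, 240.987972, 179.936803, 421.675597, 1951.773228],
})

# Pearson correlation (the default method) between fossil fuel use and CO2 emissions
r = df_1985["Total Fossil Fuels"].corr(df_1985["Annual CO₂ emissions (Million tonnes)"])
print(round(r, 3))
```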

  • In 1985 the US has the highest GDP and also the highest fossil fuel consumption and CO2 emissions, with Russia in second place on all three variables and China in third.
  • In 2000 the US remains in first place, but China takes second place. Russia's economy shrank after the dissolution of the USSR, and its bubble moves toward the bottom left of the plot, indicating less fossil fuel burned and less CO2 emitted.
  • In 2015 China takes first place in fossil fuel consumption and CO2 emissions, although its GDP per capita is still smaller than that of the US. China's heavy use of coal can explain its high CO2 emissions, and its large population can explain its high fossil fuel consumption.
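One way to check the coal explanation in the last bullet is the CO2 emitted per terawatt-hour of fossil fuel: since coal emits the most CO2 per unit of energy,[6] a coal-heavy mix should show a higher ratio. The sketch uses the 1985 rows from Out[50], because those are the values printed in this document; applying the same division to `G20_fossilfuel_CO2_2015_1` would test the 2015 claim directly.

```python
import pandas as pd

# 1985 rows from Out[50]; fossil fuel use in terawatt-hours, CO2 in million tonnes
df_1985 = pd.DataFrame({
    "Country": ["Argentina", "Australia", "Brazil", "Canada", "China"],
    "Total Fossil Fuels": [385.435827, 840.617817, 791.910415, 1763.458106, 5919.825992],
    "Annual CO₂ emissions (Million tonnes)": [100.366195, 240.987972, 179.936803, 421.675597, 1951.773228],
}).set_index("Country")

# CO2 intensity: million tonnes of CO2 per terawatt-hour of oil + gas + coal consumed
intensity = df_1985["Annual CO₂ emissions (Million tonnes)"] / df_1985["Total Fossil Fuels"]
print(intensity.sort_values(ascending=False).round(3))
```

Among these five countries China has the highest ratio in 1985, which fits its coal-dominated mix shown in the Sankey diagrams.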

10.0 Analyzing and visualizing the use of different renewable resources and their impact on total greenhouse gas emissions

10.1 Data Wrangling for visualization

In [58]:
#dropping unwanted columns
RenewableEnergyConsumtionG20_Hydro=RenewableEnergyConsumtionG20.drop(['Code','Wind (terawatt-hours)','Solar (terawatt-hours)','Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)'],axis=1)
RenewableEnergyConsumtionG20_Wind=RenewableEnergyConsumtionG20.drop(['Code','Hydropower (terawatt-hours)','Solar (terawatt-hours)','Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)'],axis=1)
RenewableEnergyConsumtionG20_Solar=RenewableEnergyConsumtionG20.drop(['Code','Hydropower (terawatt-hours)','Wind (terawatt-hours)','Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)'],axis=1)
RenewableEnergyConsumtionG20_Other=RenewableEnergyConsumtionG20.drop(['Code','Hydropower (terawatt-hours)','Wind (terawatt-hours)','Solar (terawatt-hours)'],axis=1)
In [59]:
#visualizing dataframe
RenewableEnergyConsumtionG20_Wind.head()
Out[59]:
Country Year Wind (terawatt-hours)
0 Argentina 1985 0.0
1 Argentina 1986 0.0
2 Argentina 1987 0.0
3 Argentina 1988 0.0
4 Argentina 1989 0.0
In [60]:
#set a hierarchical (MultiIndex) index on Country and Year
RenewableEnergyConsumtionG20_Wind.set_index(['Country','Year'], inplace = True)
RenewableEnergyConsumtionG20_Hydro.set_index(['Country','Year'], inplace = True)
RenewableEnergyConsumtionG20_Solar.set_index(['Country','Year'], inplace = True)
RenewableEnergyConsumtionG20_Other.set_index(['Country','Year'], inplace = True)
In [61]:
#total greenhouse gas emissions: vectorized row-wise sum of the six gas columns of the same dataframe
GreenGasEmissionsG20_1['Total Greehouse Gasses(Million tonnes)']=GreenGasEmissionsG20_1['SF₆ gases (Million tonnes)']+GreenGasEmissionsG20_1['PFC gases (Million tonnes)']\
+GreenGasEmissionsG20_1['HFC gases (Million tonnes)']+GreenGasEmissionsG20_1['Nitrous oxide (N₂O) (Million tonnes)']+GreenGasEmissionsG20_1['Methane (CH₄) (Million tonnes)']\
+GreenGasEmissionsG20_1['Carbon Dioxide (CO₂) (Million tonnes)']
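The chained `+` above and `DataFrame.sum(axis=1)` give the same total only when no value is missing: `+` propagates NaN, while `sum` skips it by default. A small sketch with two hypothetical rows (the second missing its SF₆ value) shows the difference and how `min_count` restores the stricter behaviour.

```python
import numpy as np
import pandas as pd

gas_columns = [
    "SF₆ gases (Million tonnes)",
    "PFC gases (Million tonnes)",
    "HFC gases (Million tonnes)",
    "Nitrous oxide (N₂O) (Million tonnes)",
    "Methane (CH₄) (Million tonnes)",
    "Carbon Dioxide (CO₂) (Million tonnes)",
]
# Two hypothetical rows; the second has a missing SF₆ value
emissions = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [np.nan, 2.0, 3.0, 4.0, 5.0, 6.0]],
    columns=gas_columns,
)

# Chained "+" propagates NaN, so the second total becomes NaN
plus_total = emissions[gas_columns[0]]
for col in gas_columns[1:]:
    plus_total = plus_total + emissions[col]

# sum(axis=1) skips NaN by default; min_count=len(gas_columns) makes it behave like "+"
sum_total = emissions[gas_columns].sum(axis=1)
strict_total = emissions[gas_columns].sum(axis=1, min_count=len(gas_columns))
print(plus_total.tolist(), sum_total.tolist(), strict_total.tolist())
```

Which behaviour is right depends on whether a missing gas should make the total unknown or count as zero.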
In [62]:
GreenGasEmissionsG20_1.drop(columns = ['Country', 'Year'], inplace = True)
In [63]:
#dropping unwanted columns
GreenGasEmissionsG20_21=GreenGasEmissionsG20_1.drop(['SF₆ gases (Million tonnes)','PFC gases (Million tonnes)','HFC gases (Million tonnes)',\
                            'Nitrous oxide (N₂O) (Million tonnes)','Methane (CH₄) (Million tonnes)',\
                            'Carbon Dioxide (CO₂) (Million tonnes)'],axis=1)
In [64]:
GreenGasEmissionsG20_21.head()
Out[64]:
Total Greehouse Gasses(Million tonnes)
Country Year
Argentina 1985 232.815491
1986 236.599303
1987 248.584445
1988 258.690892
1989 255.384037
In [65]:
#merge the dataframes on the two index levels, Country and Year, with an inner join,
#which keeps only the keys present in both frames
GreenGasEmissionsG20_21merged1=pd.merge(GreenGasEmissionsG20_21,RenewableEnergyConsumtionG20_Wind, on=['Country','Year'],how='inner')
In [66]:
#merge the remaining renewable dataframes the same way (inner join on Country and Year)
#to obtain the final dataframe used for the visualization
GreenGasEmissionsG20_2_hydro_1=pd.merge(GreenGasEmissionsG20_21merged1,RenewableEnergyConsumtionG20_Hydro, on=['Country','Year'],how='inner')
GreenGasEmissionsG20_2_solar_1=pd.merge(GreenGasEmissionsG20_2_hydro_1,RenewableEnergyConsumtionG20_Solar, on=['Country','Year'],how='inner')
RenewableEnergyConsumtionG20_Total=pd.merge(GreenGasEmissionsG20_2_solar_1,RenewableEnergyConsumtionG20_Other, on=['Country','Year'],how='inner')
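When every frame shares the same (Country, Year) index, the chain of inner merges can be replaced by a single `pd.concat(..., axis=1, join="inner")`, which aligns the frames side by side and keeps only index keys present in all of them. A hedged sketch with Argentina values from Out[64] and Out[68]; the 1986 greenhouse gas row exists in only one frame and is dropped, as an inner merge would drop it.

```python
import pandas as pd

idx_1985 = pd.MultiIndex.from_tuples([("Argentina", 1985)], names=["Country", "Year"])
idx_both = pd.MultiIndex.from_tuples([("Argentina", 1985), ("Argentina", 1986)], names=["Country", "Year"])

# One single-column frame per quantity, values from the Out[64] and Out[68] printouts
frames = [
    pd.DataFrame({"Total Greehouse Gasses(Million tonnes)": [232.815491, 236.599303]}, index=idx_both),
    pd.DataFrame({"Wind (terawatt-hours)": [0.0]}, index=idx_1985),
    pd.DataFrame({"Hydropower (terawatt-hours)": [20.656276]}, index=idx_1985),
    pd.DataFrame({"Solar (terawatt-hours)": [0.0]}, index=idx_1985),
    pd.DataFrame({"Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)": [0.136371]},
                 index=idx_1985),
]

# join="inner" keeps only (Country, Year) keys found in every frame
combined = pd.concat(frames, axis=1, join="inner")
print(combined)
```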
In [67]:
#reshape to long format: stack the value columns into rows, then reset the index so Country and Year become columns again
RenewableEnergyConsumtionG20_Total_1=RenewableEnergyConsumtionG20_Total.stack().reset_index()
In [68]:
#rename the columns created by stack() so they can be used in the figure
RenewableEnergyConsumtionG20_Total_1.rename(columns={"level_2": "Parameter", 0: "Values"}, inplace = True)
RenewableEnergyConsumtionG20_Total_1.head()
Out[68]:
Country Year Parameter Values
0 Argentina 1985 Total Greehouse Gasses(Million tonnes) 232.815491
1 Argentina 1985 Wind (terawatt-hours) 0.000000
2 Argentina 1985 Hydropower (terawatt-hours) 20.656276
3 Argentina 1985 Solar (terawatt-hours) 0.000000
4 Argentina 1985 Other renewables (modern biofuels; geothermal;... 0.136371
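The same long format can be reached from a wide frame with `Country` and `Year` as ordinary columns using `DataFrame.melt`, which also names the new columns directly. A minimal sketch on the Argentina 1985 row from Out[68]; one design difference is that the default `stack()` drops NaN values, while `melt` keeps them as rows.

```python
import pandas as pd

# Argentina 1985 row in wide format, values from the Out[68] printout
wide = pd.DataFrame({
    "Country": ["Argentina"],
    "Year": [1985],
    "Total Greehouse Gasses(Million tonnes)": [232.815491],
    "Wind (terawatt-hours)": [0.0],
    "Hydropower (terawatt-hours)": [20.656276],
    "Solar (terawatt-hours)": [0.0],
    "Other renewables (modern biofuels; geothermal; wave & tidal) (terawatt-hours)": [0.136371],
})

# melt turns every non-id column into a (Parameter, Values) row, in column order
long_df = wide.melt(id_vars=["Country", "Year"], var_name="Parameter", value_name="Values")
print(long_df)
```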

10.2 Line plot of total greenhouse gases and renewable energy

In [69]:
#Plotting figure without log axis
fig = px.line(RenewableEnergyConsumtionG20_Total_1, x='Year', y='Values',color='Parameter',
              line_group='Country', hover_name='Country')
fig.update_layout(legend_orientation="h")
fig.show()
In [70]:
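#plot the same figure with a logarithmic y-axis; the Year axis stays linear, since a log scale adds nothing there.
#zero values cannot be drawn on a log axis, so each line starts at its first non-zero year
fig = px.line(RenewableEnergyConsumtionG20_Total_1, x='Year', y='Values',color='Parameter',
              line_group='Country', hover_name='Country')
fig.update_layout(yaxis_type="log", legend_orientation="h")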
fig.show()

10.3 Analysis:

The line graphs show the trend in renewable energy consumption and greenhouse gas emissions of the G20 countries from 1985 to 2014. Year is shown on the x-axis; greenhouse gas emissions (million tonnes) and renewable energy consumption (terawatt-hours) share the y-axis. Our observations are given below:

  • Many countries started exploring renewable energy sources only recently. For example, Indonesia started using wind energy in 2007, which is why its wind energy line starts in 2007. Similarly, Argentina started using solar energy in 1999, Indonesia in 2008 and Saudi Arabia in 2011.
  • Even though China is the top consumer of renewable energy, its CO2 and other greenhouse gas emissions are the highest. The reasons are the amount of fossil fuels it burns and especially its fossil fuel mix, in which coal is the largest part. Another important reason is that China has the largest population of all the countries.
  • As a country's fossil fuel mix shifts from coal toward gas, and most importantly as renewable energy becomes a significant portion of its energy mix, its greenhouse gas emissions tend to decrease. For example, Germany's greenhouse gas emissions fall as it uses more renewable energy.
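The start years in the first observation are read off the plot. They could also be computed from the long-format frame by keeping only positive values and taking the earliest year per country and source. The sketch below uses hypothetical countries and values shaped like `RenewableEnergyConsumtionG20_Total_1`; it has not been run on the notebook's data.

```python
import pandas as pd

# Hypothetical long-format rows shaped like RenewableEnergyConsumtionG20_Total_1
long_df = pd.DataFrame({
    "Country": ["Country A"] * 3 + ["Country B"] * 3,
    "Year": [2006, 2007, 2008] * 2,
    "Parameter": ["Wind (terawatt-hours)"] * 6,
    "Values": [0.0, 0.1, 0.3, 0.0, 0.0, 0.2],
})

# Keep only strictly positive values, then take the earliest year per country and source
first_year = (
    long_df[long_df["Values"] > 0]
    .groupby(["Country", "Parameter"])["Year"]
    .min()
)
print(first_year)
```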

11. Conclusion:

Overall, as countries grow economically (GDP), their energy demand expands and, as a result, their consumption of fossil fuels increases. Consequently, more greenhouse gases, including CO2, are released. These greenhouse gases are harmful to humans, animals and the environment. Using more renewable energy sources reduces the dependency on fossil fuels for meeting the ever-increasing energy demand and decreases greenhouse gas emissions. Hence, all countries should try to increase the share of renewable energy sources in their energy portfolios.

12. Further Analysis

Include more variables (datasets), such as population, forest area and vegetation, to analyze and emphasize the effect of using more renewable sources of energy in reducing greenhouse gases.